Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences
نویسندگان
چکیده
ÐThis paper presents a time stamp algorithm for runtime parallelization of general DOACROSS loops that have indirect access patterns. The algorithm follows the INSPECTOR/EXECUTOR scheme and exploits parallelism at a fine-grained memory reference level. It features a parallel inspector and improves upon previous algorithms of the same generality by exploiting parallelism among consecutive reads of the same memory element. Two variants of the algorithm are considered: One allows partially concurrent reads (PCR) and the other allows fully concurrent reads (FCR). Analyses of their time complexities derive a necessary condition with respect to the iteration workload for runtime parallelization. Experimental results for a Gaussian elimination loop, as well as an extensive set of synthetic loops on a 12-way SMP server, show that the time stamp algorithms outperform iterationlevel parallelization techniques in most test cases and gain speedups over sequential execution for loops that have heavy iteration workloads. The PCR algorithm performs best because it makes a better trade-off between maximizing the parallelism and minimizing the analysis overhead. For loops with light or unknown iteration loads, an alternative speculative runtime parallelization technique is preferred. Index TermsÐCompiler, parallelizing compiler, runtime support, inspector-executor, doacross loop, dynamic dependence.
منابع مشابه
A Practical Approach to DOACROSS Parallelization
Loops with cross-iteration dependences (DOACROSS loops) often contain significant amounts of parallelism that can potentially be exploited on modern manycore processors. However, most production-strength compilers focus their automatic parallelization efforts on DOALL loops, and consider DOACROSS parallelism to be impractical due to the space inefficiencies and the synchronization overheads of ...
متن کاملEffects of Parallelism Degree on Run-Time Parallelization of Loops
Due to the overhead for exploiting and managing parallelism, run-time loop parallelization techniques with the aim of maximizing parallelism may not necessarily lead to the best performance. In this paper, we present two parallelization techniques that exploit different degrees of parallelism for loops with dynamic crossiteration dependences. The DOALL approach exploits iterationlevel paralleli...
متن کاملAn Empirical Study on DOACROSS Loops
Loop-iteration level parallelism is one of the most common forms of parallelism being exploited by optimizing compilers and parallel machines. In this study, we selected 6 large application programs and used an execution-driven simulation technique from MaxPar 5] to identify and to measure the eeectiveness of concurrent DOACROSS loops execution. It was found that executing DOACROSS loops serial...
متن کاملEXPLORER: Supporting Run-Time Parallelization of DO-ACROSS Loops on General Networks of Workstations
Performing runtime parallelization on general networks of workstations (NOWs) without special hardware or system software supports is very diicult, especially for DOACROSS loops. With the high communication overhead on NOWs, there is hardly any performance gain for runtime parallelization, due to the latter's large amount of messages for dependence detection, data accesses, and computation sche...
متن کاملOn Effective Execution of Nonuniform DOACROSS Loops
It is extremely difficult to parallelize DOACROSS loops with non-uniform loop-carried dependences. In this paper, we present a static scheduling scheme with an accompanying synchronization strategy that can execute such DOACROSS loops effectively and efficiently. Our approach uses one of the parallelization techniques called Dependence Uniformization, which finds a small set of uniform dependen...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- IEEE Trans. Parallel Distrib. Syst.
دوره 12 شماره
صفحات -
تاریخ انتشار 2001